Results 1 - 20 of 51
1.
BMC Bioinformatics ; 24(1): 232, 2023 Jun 05.
Article in English | MEDLINE | ID: covidwho-20234026

ABSTRACT

BACKGROUND: Recent epidemic outbreaks such as the SARS-CoV-2 pandemic and the mpox outbreak in 2022 have demonstrated the value of genomic sequencing data for tracking the origin and spread of pathogens. Laboratories around the globe generated new sequences at unprecedented speed and volume and bioinformaticians developed new tools and dashboards to analyze this wealth of data. However, a major challenge that remains is the lack of simple and efficient approaches for accessing and processing sequencing data. RESULTS: The Lightweight API for Sequences (LAPIS) facilitates rapid retrieval and analysis of genomic sequencing data through a REST API. It supports complex mutation- and metadata-based queries and can perform aggregation operations on massive datasets. LAPIS is optimized for typical questions relevant to genomic epidemiology. Using a newly-developed in-memory database engine, it has a high speed and throughput: between 25 January and 4 February 2023, the SARS-CoV-2 instance of LAPIS, which contains 14.5 million sequences, processed over 20 million requests with a mean response time of 411 ms and a median response time of 1 ms. LAPIS is the core engine behind our dashboards on genspectrum.org and we currently maintain public LAPIS instances for SARS-CoV-2 and mpox. CONCLUSIONS: Powered by an optimized database engine and available through a web API, LAPIS enhances the accessibility of genomic sequencing data. It is designed to serve as a common backend for dashboards and analyses with the potential to be integrated into common database platforms such as GenBank.
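The abstract describes metadata- and mutation-based queries with aggregation over a REST API. As a rough illustration only, the sketch below builds the URL for such an aggregated query. The host `lapis.example.org`, the `/sample/aggregated` path, and the parameter names `country`, `nucMutations`, and `fields` are assumptions made for this example and are not taken from the abstract; the LAPIS documentation is the authority on the real interface.

```python
from urllib.parse import urlencode

# Hypothetical base URL: a placeholder host, not a real LAPIS instance.
BASE_URL = "https://lapis.example.org/v1"


def build_aggregated_query(filters, fields=()):
    """Build a GET URL asking for counts of sequences that match `filters`.

    `filters` maps assumed filter names (metadata or mutations) to values;
    `fields` lists assumed metadata columns to group the counts by.
    """
    params = dict(filters)
    if fields:
        params["fields"] = ",".join(fields)
    # Sort parameters so the same query always produces the same URL.
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/sample/aggregated?{query}"


# Example: count sequences from one country carrying one nucleotide
# mutation, grouped by sampling date (all names are illustrative).
url = build_aggregated_query(
    {"country": "Switzerland", "nucMutations": "C241T"},
    fields=["date"],
)
print(url)
```

Sorting the parameters is a design choice for this sketch: identical queries then map to identical URLs, which makes responses easier to cache on the client side.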


Subject(s)
COVID-19 , Monkeypox , Humans , SARS-CoV-2/genetics , Genome , Genomics
2.
Genes (Basel) ; 14(4)2023 03 31.
Article in English | MEDLINE | ID: covidwho-2323545

ABSTRACT

Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated proteins (Cas) are promising molecular diagnostic tools for rapidly and precisely elucidating the structure and function of genomes due to their high specificity, programmability, and multi-system compatibility in nucleic acid recognition. Multiple parameters limit the ability of a CRISPR/Cas system to detect DNA or RNA. Consequently, it must be used in conjunction with other nucleic acid amplification techniques or signal detection techniques, and the reaction components and reaction conditions should be modified and optimized to maximize the detection performance of the CRISPR/Cas system against various targets. As the field continues to develop, CRISPR/Cas systems have the potential to become an ultra-sensitive, convenient, and accurate biosensing platform for the detection of specific target sequences. The design of a molecular detection platform employing the CRISPR/Cas system rests on three primary strategies: (1) performance optimization of the CRISPR/Cas system; (2) enhancement of the detection signal and its interpretation; and (3) compatibility with multiple reaction systems. This article focuses on the molecular characteristics and application value of the CRISPR/Cas system and reviews recent research progress and development directions from the perspectives of principle, performance, and method development challenges to provide a theoretical foundation for the development and application of the CRISPR/Cas system in molecular detection technology.


Subject(s)
CRISPR-Cas Systems , DNA , CRISPR-Cas Systems/genetics , RNA , Genome
3.
Lancet Microbe ; 4(6): e395, 2023 06.
Article in English | MEDLINE | ID: covidwho-2292727
4.
Asian Pac J Allergy Immunol ; 40(4): 422-434, 2022 Dec.
Article in English | MEDLINE | ID: covidwho-2289003

ABSTRACT

BACKGROUND: Neanderthals were a species of archaic humans that became extinct around 40,000 years ago. Modern humans have inherited 1-6% of Neanderthal DNA as a result of interbreeding. These inherited Neanderthal genes have paradoxical influences: while some can provide protection against viral infections, others are associated with autoimmune/auto-inflammatory diseases. OBJECTIVE: We aim to investigate whether genetic variants with strong detrimental effects on the function of the immune system could have potentially contributed to the extinction of the Neanderthal population. METHODS: We used the publicly available genome information from an Altai Neanderthal and filtered for potentially damaging variants present in genes associated with inborn errors of immunity (IEI) and checked whether these variants were present in the genomes of the Denisovan, Vindija and Chagyrskaya Neanderthals. RESULTS: We identified 24 homozygous variants and 15 heterozygous variants in IEI-related genes in the Altai Neanderthal. Two homozygous variants in the UNC13D gene and one variant in the MOGS gene were present in all archaic genomes. Defects in the UNC13D gene are known to cause a severe and often fatal disease called hemophagocytic lymphohistiocytosis (HLH). One of these variants, p.(N943S), has been reported in patients with HLH. Variants in MOGS are associated with glycosylation defects in the immune system affecting the susceptibility to infections. CONCLUSIONS: Although the exact functional impact of these three variants needs further elucidation, we speculate that they could have resulted in an increased susceptibility to severe diseases and may have contributed to the extinction of Neanderthals after exposure to specific infections.


Subject(s)
Neanderthals , Humans , Animals , Neanderthals/genetics , Genome , Genome, Human , Membrane Proteins/genetics
5.
Int J Biol Macromol ; 238: 124054, 2023 May 31.
Article in English | MEDLINE | ID: covidwho-2252112

ABSTRACT

Clustered regularly interspaced short palindromic repeats (CRISPR) and the CRISPR-associated proteins (Cas) system (CRISPR-Cas) came to light as a prokaryotic defence mechanism for adaptive immune response. CRISPR-Cas works by integrating short sequences of the target genome (spacers) into the CRISPR locus. The locus, containing spacers interspersed with repeats, is further expressed into small guide CRISPR RNA (crRNA), which is then deployed by the Cas proteins to target the foreign genome. Based on the Cas proteins, CRISPR-Cas is classified according to a polythetic system of classification. The ability of the CRISPR-Cas9 system to target DNA sequences using programmable RNAs has opened new arenas, and today CRISPR-Cas has evolved into a cutting-edge technique in the field of genome editing. Here, we discuss the evolution of CRISPR, its classification and various Cas systems, including the design and molecular mechanism of CRISPR-Cas. Applications of CRISPR-Cas as a genome editing tool are also highlighted in areas such as agriculture and anticancer therapy. We also briefly discuss the role of CRISPR and its Cas systems in the diagnosis of COVID-19 and its possible preventive measures. The challenges in existing CRISPR-Cas technologies and their potential solutions are also discussed briefly.


Subject(s)
COVID-19 , Gene Editing , Humans , Gene Editing/methods , CRISPR-Cas Systems/genetics , COVID-19/genetics , Genome
6.
Genes (Basel) ; 13(9)2022 09 10.
Article in English | MEDLINE | ID: covidwho-2236662

ABSTRACT

Genetic variation has been widely covered in the literature, but not from the perspective of an individual in any species. Here, a synthesis of genetic concepts and variations relevant for individual genetic constitution is provided. All the different levels of genetic information and variation are covered, ranging from whether an organism is unmixed or hybrid, has variations in genome, chromosomes, and more locally in DNA regions, to epigenetic variants or alterations in selfish genetic elements. Genetic constitution and heterogeneity of microbiota are highly relevant for the health and wellbeing of an individual. Mutation rates vary widely for variation types, e.g., due to the sequence context. Genetic information guides numerous aspects in organisms. Types of inheritance, whether Mendelian or non-Mendelian, zygosity, sexual reproduction, and sex determination are covered. Functions of DNA and functional effects of variations are introduced, along with mechanisms that reduce and modulate functional effects, including TARAR countermeasures and intraindividual genetic conflict. TARAR countermeasures for tolerance, avoidance, repair, attenuation, and resistance are essential for life, integrity of genetic information, and gene expression. The genetic composition, effects of variations, and their expression are also considered in diseases and personalized medicine. The text synthesizes knowledge and insight on individual genetic heterogeneity and organizes and systematizes the central concepts.


Subject(s)
Genetic Heterogeneity , Genome , Chromosomes , DNA , Reproduction/genetics
7.
Proc Natl Acad Sci U S A ; 120(1): e2207544120, 2023 Jan 03.
Article in English | MEDLINE | ID: covidwho-2186691

ABSTRACT

A growing body of work has addressed human adaptations to diverse environments using genomic data, but few studies have connected putatively selected alleles to phenotypes, much less among underrepresented populations such as Amerindians. Studies of natural selection and genotype-phenotype relationships in underrepresented populations hold potential to uncover previously undescribed loci underlying evolutionarily and biomedically relevant traits. Here, we worked with the Tsimane and the Moseten, two Amerindian populations inhabiting the Bolivian lowlands. We focused most intensively on the Tsimane, because long-term anthropological work with this group has shown that they have a high burden of both macro and microparasites, as well as minimal cardiometabolic disease or dementia. We therefore generated genome-wide genotype data for Tsimane individuals to study natural selection, and paired this with blood mRNA-seq as well as cardiometabolic and immune biomarker data generated from a larger sample that included both populations. In the Tsimane, we identified 21 regions that are candidates for selective sweeps, as well as 5 immune traits that show evidence for polygenic selection (e.g., C-reactive protein levels and the response to coronaviruses). Genes overlapping candidate regions were strongly enriched for known involvement in immune-related traits, such as abundance of lymphocytes and eosinophils. Importantly, we were also able to draw on extensive phenotype information for the Tsimane and Moseten and link five regions (containing PSD4, MUC21 and MUC22, TOX2, ANXA6, and ABCA1) with biomarkers of immune and metabolic function. Together, our work highlights the utility of pairing evolutionary analyses with anthropological and biomedical data to gain insight into the genetic basis of health-related traits.


Subject(s)
Genome , Genomics , Humans , Bolivia , Genotype , Phenotype , Biomarkers , Selection, Genetic , Polymorphism, Single Nucleotide
8.
Comput Biol Med ; 153: 106522, 2023 Feb.
Article in English | MEDLINE | ID: covidwho-2165197

ABSTRACT

The genomic substitution rate (GSR) of SARS-CoV-2 exhibits a molecular clock feature and does not change under fluctuating environmental factors such as the size of the infected human population (10^0-10^7), vaccination, etc. The molecular clock feature is believed to be inconsistent with the selectionist theory (ST). The GSR shows a lack of dependence on the effective population size, suggesting that Ohta's nearly neutral theory (ONNT) is not applicable to this virus. The large variation of the substitution rate within its genome is also inconsistent with Kimura's neutral theory (KNT). Thus, all three existing evolution theories fail to explain the evolutionary nature of this virus. In this paper, we proposed a Segment Substitution Rate Model (SSRM) under non-neutral selections and pointed out that a balanced mechanism between negative and positive selection of some segments could also lead to the molecular clock feature. We named this hybrid mechanism the near-neutral balanced selection theory (NNBST) and examined whether SARS-CoV-2 follows it, using three independent sets of SARS-CoV-2 genomes selected by the Nextstrain team. Intriguingly, the relative substitution rate of this virus exhibited an L-shaped probability distribution consistent with NNBST, rather than the Poisson distribution predicted by KNT, the asymmetric distribution predicted by ONNT (in which nearly neutral sites are believed to be only slightly deleterious), or the distribution lacking nearly neutral sites predicted by ST. The time-dependence of the substitution rates for some segments and their correlation with vaccination were observed, supporting NNBST. Our relative substitution rate method provides a tool to resolve the long-standing "neutralist-selectionist" controversy. Implications of NNBST for resolving Lewontin's Paradox are also discussed.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Mutation , SARS-CoV-2/genetics , COVID-19/genetics , Genome , Biological Evolution , Evolution, Molecular
9.
Gigascience ; 10(6)2021 06 29.
Article in English | MEDLINE | ID: covidwho-2161022

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups under great expenditure of time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution to easily process 1 million GWAS samples. RESULTS: Here we present BIGwas, a portable, fully automated quality control and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using Nextflow workflow and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance compute (HPC) system with just 1 command, with no need to manually install a software execution environment or various software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes ∼16 days on a small HPC system with only 7 compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes for large HPCs. CONCLUSIONS: Researchers without extensive bioinformatics knowledge and with few computer resources can use BIGwas to perform multi-cohort GWAS with 1 million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource. BIGwas is freely available for download from http://github.com/ikmb/gwas-qc and http://github.com/ikmb/gwas-assoc.


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Genome , Humans , Phenotype , Polymorphism, Single Nucleotide , Quality Control , Software
10.
PeerJ ; 10: e14425, 2022.
Article in English | MEDLINE | ID: covidwho-2145069

ABSTRACT

The optimization of resources for research in developing countries forces us to consider strategies in the wet lab that allow the reuse of molecular biology reagents to reduce costs. In this study, we used linear regression as a method for predictive modeling of coverage depth given the number of MinION reads sequenced to define the optimum number of reads necessary to obtain >200X coverage depth with a good lineage-clade assignment of SARS-CoV-2 genomes. The research aimed to create and implement a model based on machine learning algorithms to predict different variables (e.g., coverage depth) given the number of MinION reads produced by Nanopore sequencing to maximize the yield of high-quality SARS-CoV-2 genomes, determine the best sequencing runtime, and to be able to reuse the flow cell with the remaining nanopores available for sequencing in a new run. The best accuracy was -0.98 according to the R squared performance metric of the models. A demo version is available at https://genomicdashboard.herokuapp.com/.
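The abstract's core idea, predicting coverage depth from the number of reads with linear regression and then finding the read count needed for a target depth, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the read counts and depths are synthetic numbers invented for the example, and the helper names `fit_line`, `r_squared`, and `reads_for_depth` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def reads_for_depth(target_depth, slope, intercept):
    """Smallest whole number of reads whose predicted depth reaches the target."""
    return math.ceil((target_depth - intercept) / slope)


# Synthetic data (NOT from the paper): reads sequenced vs. mean coverage depth.
reads = [10_000, 20_000, 30_000, 40_000]
depth = [62.0, 118.0, 181.0, 239.0]

slope, intercept = fit_line(reads, depth)
r2 = r_squared(reads, depth, slope, intercept)
needed = reads_for_depth(200.0, slope, intercept)
print(f"depth = {slope:.5f} * reads + {intercept:.2f}, R^2 = {r2:.4f}")
print(f"reads needed for 200X: {needed}")
```

Inverting the fitted line gives a stopping criterion for a run: once the read count passes the predicted threshold, sequencing could in principle be stopped and the remaining pores kept for another run.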


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Sequence Analysis, DNA/methods , SARS-CoV-2/genetics , High-Throughput Nucleotide Sequencing/methods , Genome
11.
Sci Rep ; 12(1): 13971, 2022 08 17.
Article in English | MEDLINE | ID: covidwho-2016825

ABSTRACT

A comprehensive study of the properties of finite (0,1) binary systems from the mathematical viewpoint of quantum theory is presented. This is a quantum-inspired extension of the GenomeBits model to characterize observed genome sequences, where a complex wavefunction [Formula: see text] is considered as an analogous probability measure and is related to an alternating (0,1) binary series having independently distributed terms. The real and imaginary spectra of [Formula: see text] vs. the nucleotide base positions display characteristic features of sound waves. This approach represents a novel perspective for identifying and "observing" emergent properties of genome sequences in the form of wavefunctions via superposition states. The motivation is to develop a simple algorithm to perform wave calculations from binary sequences and to apply these wave functions to sonification.


Subject(s)
Algorithms , Quantum Theory , Genome
12.
Int J Mol Sci ; 23(17)2022 Aug 31.
Article in English | MEDLINE | ID: covidwho-2006048

ABSTRACT

A prolonged pandemic with numerous human casualties requires a rapid search for means to control the various strains of SARS-CoV-2. Since only part of the human population is affected by coronaviruses, there are probably endogenous compounds preventing the spread of these viral pathogens. It has been shown that piRNAs (PIWI-interacting RNAs) interact with the mRNA of human genes and can block protein synthesis at the stage of translation. We estimated the effects of piRNA on SARS-CoV-2 genomic RNA (gRNA) in silico. A cluster of 13 piRNA binding sites (BS) in the SARS-CoV-2 gRNA region encoding the oligopeptide was identified. The second cluster, of 39 piRNA BSs, also encodes the oligopeptide. The third cluster, of 24 piRNA BSs, encodes the oligopeptide. Twelve piRNAs were identified that strongly interact with the gRNA. Based on the identified functionally important endogenous piRNAs, synthetic piRNAs (spiRNAs) are proposed that will suppress the multiplication of the coronavirus even more strongly. These spiRNAs and selected endogenous piRNAs have little effect on 17,494 human protein-coding genes, indicating a low probability of side effects. The piRNA and spiRNA selection methodology created for the control of SARS-CoV-2 (NC_045512.2) can be used to control all strains of SARS-CoV-2.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/genetics , Genome , Humans , RNA, Guide, Kinetoplastida , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , SARS-CoV-2/genetics
13.
Comput Biol Med ; 145: 105428, 2022 06.
Article in English | MEDLINE | ID: covidwho-1944670

ABSTRACT

COVID-19 presents a complex disease that needs to be addressed using systems medicine approaches that include genome-scale metabolic models (GEMs). Previous studies have used a single model extraction method (MEM) and/or a single transcriptomic dataset to reconstruct context-specific models, which proved to be insufficient for the broader biological contexts. We have applied four MEMs in combination with five COVID-19 datasets. Models produced by GIMME were separated by infection, while tINIT preserved the biological variability in the data and enabled the best prediction of the enrichment of metabolic subsystems. Vitamin D3 metabolism was predicted to be down-regulated in one dataset by GIMME, and in all by tINIT. Models generated by tINIT and GIMME predicted downregulation of retinol metabolism in different datasets, while downregulated cholesterol metabolism was predicted only by tINIT-generated models. Predictions are in line with the observations in COVID-19 patients. Our data indicated that GIMME and tINIT models provided the most biologically relevant results and should have a larger emphasis in further analyses. Particularly tINIT models identified the metabolic pathways that are a part of the host response and are potential antiviral targets. The code and the results of the analyses are available to download from https://github.com/CompBioLj/COVID_GEMs_and_MEMs.


Subject(s)
COVID-19 , COVID-19/genetics , Genome , Humans , Metabolic Networks and Pathways , Models, Biological , Transcriptome
14.
Comput Biol Med ; 147: 105756, 2022 08.
Article in English | MEDLINE | ID: covidwho-1930824

ABSTRACT

The rapid growth of metabolomics has led to an increasing focus on metabolic pathway modeling and reconstruction. In particular, reconstructing an organism's metabolic network based on its genome sequence is a key challenge in systems biology. The method used to address this problem predicts the presence or absence of metabolic pathways from known pathways in a reference database. However, this method is based on manual metabolic pathway construction and cannot be used for large genome sequencing data. To address such problems, we apply a supervised machine learning approach consisting of deep neural networks to learn feature representations of metabolic pathways and feed these representations into random forests to predict metabolic pathways. The supervised learning model, DeepRF, predicts all known and unknown metabolic pathways in an organism. Evaluation of DeepRF on over 318,016 instances shows that the model can predict metabolic pathways with high accuracy (>97%), recall (>95%), and precision (>99%). Comparing DeepRF with other methods in the literature shows that DeepRF produces more reliable results than other methods.
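For readers less familiar with the three metrics quoted in the abstract, the sketch below computes accuracy, recall, and precision for a binary "pathway present / absent" prediction. The labels are toy values invented for this example and have nothing to do with the DeepRF evaluation; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision: of the pathways predicted present, how many truly are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the pathways truly present, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels (NOT from the paper): 1 = pathway present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when present and absent pathways are unevenly represented, since accuracy alone can look high for a model that rarely predicts the minority class.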


Subject(s)
Deep Learning , Databases, Factual , Genome , Metabolic Networks and Pathways/genetics , Neural Networks, Computer
15.
Int J Mol Sci ; 23(10)2022 May 12.
Article in English | MEDLINE | ID: covidwho-1875640

ABSTRACT

Viral infections can be fatal and consequently are a serious threat to human health. Therefore, the development of vaccines and appropriate antiviral therapeutic agents is essential. Depending on the virus, an infection can be acute or chronic. The characteristics of viruses can act as inhibiting factors for the development of appropriate treatment methods. Genome editing technology, including the use of clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) proteins, zinc-finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs), is a technology that can directly target and modify genomic sequences in almost all eukaryotic cells. The development of this technology has greatly expanded its applicability in life science research and gene therapy development. Research on the use of this technology to develop therapeutics for viral diseases is being conducted for various purposes, such as eliminating latent infections or providing resistance to new infections. In this review, we will look at the current status of the development of viral therapeutic agents using genome editing technology and discuss how this technology can be used as a new treatment approach for viral diseases.


Subject(s)
Gene Editing , Virus Diseases , Genome , Humans , Technology , Transcription Activator-Like Effector Nucleases/genetics , Virus Diseases/genetics , Virus Diseases/therapy
16.
Gigascience ; 11, 2022 05 28.
Article in English | MEDLINE | ID: covidwho-1873910

ABSTRACT

BACKGROUND: The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was generated in 2013 using whole-genome shotgun sequencing with short-read sequence data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and greater continuity. FINDINGS: Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gb, similar to the 2.50-Gb length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes are annotated in BCM_Maur_2.0 compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. This new assembly also reduces the unresolved regions as measured by nucleotide ambiguities: ∼17.11% of bases in MesAur1.0 were unresolved, compared to 3.00% in BCM_Maur_2.0. CONCLUSIONS: Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.
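The scaffold N50 used above as the continuity measure is the length L such that scaffolds of length L or longer together cover at least half of the assembly. A minimal sketch of that calculation follows; the scaffold lengths are toy values chosen for illustration, not figures from either hamster assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Walk the lengths from longest to shortest and return the first length
    at which the running total reaches at least half of the assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("need at least one sequence length")


# Toy lengths (NOT real assembly data), in arbitrary units.
fragmented = [80, 70, 50, 40, 30, 20, 10]
# Same total size, but the two longest pieces are joined into one scaffold.
more_contiguous = [150, 50, 40, 30, 20, 10]

print(n50(fragmented), n50(more_contiguous))
```

Because both toy assemblies have the same total length, the difference in N50 reflects continuity alone, which is the sense in which the abstract compares the two assemblies.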


Subject(s)
Chromosomes, Mammalian , Mesocricetus , Animals , Chromosomes, Mammalian/genetics , Genome , High-Throughput Nucleotide Sequencing/methods , Mesocricetus/genetics , Whole Genome Sequencing
17.
Gigascience ; 11, 2022 05 18.
Article in English | MEDLINE | ID: covidwho-1853072

ABSTRACT

BACKGROUND: The masked palm civet (Paguma larvata) acts as an intermediate host of severe acute respiratory syndrome coronavirus (SARS-CoV), which caused SARS, and transferred this virus from bats to humans. Additionally, P. larvata has the potential to carry a variety of zoonotic viruses that may threaten human health. However, genome resources for P. larvata have not been reported to date. FINDINGS: A chromosome-level genome assembly of P. larvata was generated using PacBio sequencing, Illumina sequencing, and Hi-C technology. The genome assembly was 2.44 Gb in size, of which 95.32% could be grouped into 22 pseudochromosomes, with contig N50 and scaffold N50 values of 12.97 Mb and 111.81 Mb, respectively. A total of 21,582 protein-coding genes were predicted, and 95.20% of the predicted genes were functionally annotated. Phylogenetic analysis of 19 animal species confirmed the close genetic relationship between P. larvata and species belonging to the Felidae family. Gene family clustering revealed 119 unique, 243 significantly expanded, and 58 significantly contracted genes in the P. larvata genome. We identified 971 positively selected genes in P. larvata, including one known human viral receptor gene, PDGFRA, which is required for human cytomegalovirus infection. CONCLUSIONS: This high-quality genome assembly provides a valuable genomic resource for exploring virus-host interactions. It will also provide a reliable reference for studying the genetic bases of the morphologic characteristics, adaptive evolution, and evolutionary history of this species.


Subject(s)
Genome , Viverridae , Animals , Chromosomes , Genomics , Phylogeny , Viverridae/genetics
18.
BMC Bioinformatics ; 23(Suppl 3): 149, 2022 Apr 25.
Article in English | MEDLINE | ID: covidwho-1808338

ABSTRACT

BACKGROUND: The widely spreading coronavirus disease (COVID-19) has three major spreading properties: pathogenic mutations, spatial propagation patterns, and temporal propagation patterns. We know the spread of the virus geographically and temporally in terms of statistics, i.e., the number of patients. However, we are yet to understand the spread at the level of individual patients. As of March 2021, COVID-19 is widespread all over the world with new genetic variants. One important question is how to track the early spreading patterns of COVID-19 up to the point at which the virus had spread all over the world. RESULTS: In this work, we proposed AutoCoV, a deep learning method with multiple loss objectives, that can track the early spread of COVID-19 in terms of spatial and temporal patterns until the disease had fully spread over the world in July 2020. Performances in learning spatial or temporal patterns were measured with two clustering measures and one classification measure. For annotated SARS-CoV-2 sequences from the National Center for Biotechnology Information (NCBI), AutoCoV outperformed seven baseline methods in our experiments for learning either spatial or temporal patterns. For spatial patterns, AutoCoV had at least 1.7-fold higher clustering performances and an F1 score of 88.1%. For temporal patterns, AutoCoV had at least 1.6-fold higher clustering performances and an F1 score of 76.1%. Furthermore, AutoCoV demonstrated the robustness of the embedding space with an independent dataset, Global Initiative for Sharing All Influenza Data (GISAID). CONCLUSIONS: In summary, AutoCoV learns geographic and temporal spreading patterns successfully in experiments on NCBI and GISAID datasets and is, to the best of our knowledge, the first of its kind that learns virus spreading patterns from the genome sequences. We expect that this type of embedding method will be helpful in characterizing fast-evolving pandemics.


Subject(s)
COVID-19 , Deep Learning , COVID-19/epidemiology , Genome , Humans , Pandemics , SARS-CoV-2
19.
Mamm Genome ; 33(1): 143-156, 2022 03.
Article in English | MEDLINE | ID: covidwho-1767484

ABSTRACT

Mouse models are essential for dissecting disease mechanisms and defining potential drug targets. There are more than 18,500 mouse strains available for research communities in the National Resource Center for Mutant Mice (NRCMM) of China, affiliated with the Model Animal Research Center of Nanjing University and Gempharmatech Company. In 2019, Gempharmatech launched the Knockout All Project (KOAP), aiming to generate null mutants and gene floxed strains for all protein-coding genes in the mouse genome within 5 years. So far, KOAP has generated 8,004 floxed strains and 9,769 KO (knockout) strains (as of October 2021). NRCMM also created hundreds of Cre transgenic lines, mutant knock-in models, immuno-deficient models, and humanized mouse models. As a member of the International Mouse Phenotyping Consortium (IMPC), NRCMM provides comprehensive phenotyping services for mouse models. In summary, NRCMM will continue to support the biomedical community with new mouse models as well as related services.


Subject(s)
Genome , Animals , China , Disease Models, Animal , Humans , Mice , Mice, Knockout , Phenotype
20.
Per Med ; 19(3): 229-250, 2022 05.
Article in English | MEDLINE | ID: covidwho-1736673

ABSTRACT

Aim: A human immunogenetics variation study was conducted in samples collected from diverse COVID-19 populations. Materials & methods: A whole-genome and whole-exome sequencing (WGS/WES) data processing, analysis and visualization pipeline was applied to identify variants associated with genes of interest. Results: A total of 2886 mutations were found across the entire set of 13 genomes. Functional annotation of the gene variants revealed mutation type and protein change. Many variants were found to be biologically implicated in COVID-19. The involvement of these genes was also found in multiple other diseases. Conclusion: The analysis determined that ACE2, TMPRSS4, TMPRSS2, SLC6A20 and FYCO1 had functional implications and TMPRSS4 was the gene most altered in virally infected patients.


The quest to establish an understanding of the genetics underlying COVID-19 is a central focus of life sciences today. COVID-19 is triggered by SARS-CoV-2, a single-stranded RNA respiratory virus. Several clinical-genomics studies have emerged positing different human gene mutations associated with COVID-19. A global analysis of these genes was conducted targeting major components of the immune system to identify possible variations likely to be involved in COVID-19 predisposition. Gene-variant analysis was performed on whole-genome sequencing samples collected from diverse populations. ACE2, TMPRSS4, TMPRSS2, SLC6A20 and FYCO1 were found to have functional implications and TMPRSS4 may have a role in the severity of clinical manifestations of COVID-19.


Subject(s)
COVID-19 , Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , Genome , Humans , Membrane Transport Proteins/genetics , SARS-CoV-2/genetics , Exome Sequencing